When Multiwords Go Bad in Machine Translation
Abstract
This paper addresses the impact of multiword translation errors in machine translation (MT). We have analysed translations of multiwords in the OpenLogos rule-based system (RBMT) and in the Google Translate statistical system (SMT) for the English-French, English-Italian, and English-Portuguese language pairs. Our study shows that, for distinct reasons, multiwords remain a problematic area for MT regardless of the approach, and require adequate linguistic quality evaluation metrics founded on a systematic categorization of errors by MT expert linguists. We propose an empirically driven taxonomy for multiwords, and highlight the need to develop specific corpora for multiword evaluation. Finally, the paper presents the Logos approach to multiword processing, illustrating how semantico-syntactic rules contribute to multiword translation quality.
Similar resources
Machine Translation of Non-Contiguous Multiword Units
Non-adjacent linguistic phenomena such as non-contiguous multiwords and other phrasal units containing insertions, i.e., words that are not part of the unit, are difficult to process and remain a problem for NLP applications. Non-contiguous multiword units are common across languages and constitute some of the most important challenges to high quality machine translation. This paper presents an...
CLUE-Aligner: An Alignment Tool to Annotate Pairs of Paraphrastic and Translation Units
Currently available alignment tools and procedures for marking-up alignments overlook non-contiguous multiword units for being too complex within the bounds of the proposed alignment methodologies. This paper presents the CLUE-Aligner (Cross-Language Unit Elicitation Aligner), a web alignment tool designed for manual annotation of pairs of paraphrastic and translation units, representing both c...
Computing Transfer Score in Example-Based Machine Translation
This paper presents an idea in Example-Based Machine Translation: computing the transfer score for each produced translation. When an EBMT system finds an example in the translation memory, it tries to modify the sentence in order to produce the best possible translation of the input sentence. The user of the system, however, is unable to judge the quality of the translation. This problem can be s...
Universal Words and their relationship to Multilinguality, Wordnet and Multiwords
In this article we address issues concerning construction of lexicon in the context of sentential knowledge representation in Universal Networking Language (UNL), an interlingua proposed in 1996 for machine translation. Lexical knowledge in UNL is in the form of Universal Words (UWs) which are concepts expressed by mostly English words disambiguated and stored in the universal words repository....
Detecting Hidden Multiwords in Bilingual Dictionaries
Dictionaries are a valuable source of information about multiwords. Unfortunately, only few multiwords are explicitly marked as such in dictionaries: most of them are presented without being distinguished from free combinations of words. In this paper we present a methodology for detecting hidden multiwords in bilingual dictionaries, along with their translation in another language. The methodo...
Publication date: 2013